Preliminary analysis to inform variables for inclusion in the Bayesian occupancy model. Candidate variables include:
Occupancy variables:
Note, X, Y, are also available. Kebele is available but as not all kebeles are sampled this would need to be a random effect.
Observation variables:
Note, further observation variables (canopy, understory, etc.) were collected but had too many missing variables to include in this instance. A more detailed habitat_type is also available.
This analysis will:
Data are all cleaned and prepared in the sourced R script:
source("/Users/elaw/Desktop/LeuphanaProject/BirdModelling/ETH_birds/R/GLMandRFanalysis/GLMandRFanalysis_Data_v2.R")
This includes separate farm and forest data frames. All sites have 2 rounds of sampling. Farm data have 80 sites, and Forest has 65 sites. Start time and date were converted to numeric before being centered and scaled. All other continuous variables (including ordinal/count variables n_observers, visibility, and cloud, but excluding X,Y) will need to be scaled and centered (after transformation).
frmSR %>% summary()
## point_id round SR habitat n_observers
## Length:160 1:80 Min. : 2.000 FARM:160 Min. :1.000
## Class :character 2:80 1st Qu.: 5.750 FOR : 0 1st Qu.:1.000
## Mode :character Median : 8.000 Median :1.000
## Mean : 7.969 Mean :1.156
## 3rd Qu.:10.000 3rd Qu.:1.000
## Max. :18.000 Max. :3.000
## recording start date visibility
## 0: 11 Min. :-1.86057 Min. :-1.784218 Min. :0.000
## 1:149 1st Qu.:-0.85994 1st Qu.:-0.681072 1st Qu.:2.000
## Median :-0.20436 Median : 0.095926 Median :2.000
## Mean :-0.05453 Mean : 0.001199 Mean :2.062
## 3rd Qu.: 0.81007 3rd Qu.: 1.016812 3rd Qu.:3.000
## Max. : 1.93491 Max. : 1.515626 Max. :5.000
## cloud kebele hab_details X
## Min. :0.000 Length:160 Length:160 Min. :36.05
## 1st Qu.:0.000 Class :character Class :character 1st Qu.:36.23
## Median :2.000 Mode :character Mode :character Median :36.40
## Mean :1.656 Mean :36.34
## 3rd Qu.:3.000 3rd Qu.:36.42
## Max. :3.000 Max. :36.47
## Y elevation slope twi farm_type
## Min. :7.569 Min. :1555 Min. : 0.3981 Min. : 4.648 0: 20
## 1st Qu.:7.874 1st Qu.:1774 1st Qu.: 6.0388 1st Qu.: 5.462 1:140
## Median :7.923 Median :2024 Median : 8.6846 Median : 6.035
## Mean :7.908 Mean :2018 Mean :10.3337 Mean : 6.648
## 3rd Qu.:8.009 3rd Qu.:2236 3rd Qu.:13.7028 3rd Qu.: 7.447
## Max. :8.059 Max. :2439 Max. :30.9040 Max. :13.850
## sidi1ha sidi200 pwv1ha pwv500
## Min. :0.00000 Min. :0.0919 Min. : 0.000 Min. : 0.6119
## 1st Qu.:0.06679 1st Qu.:0.3095 1st Qu.: 0.000 1st Qu.: 5.8891
## Median :0.17883 Median :0.5165 Median : 0.000 Median : 9.2416
## Mean :0.22413 Mean :0.4576 Mean : 2.346 Mean :17.3681
## 3rd Qu.:0.34480 3rd Qu.:0.6014 3rd Qu.: 1.543 3rd Qu.:31.3321
## Max. :0.66663 Max. :0.7683 Max. :27.160 Max. :61.0962
## fl_dis fl_dis85
## Min. : 30.00 Min. : 0.00
## 1st Qu.: 89.13 1st Qu.: 72.28
## Median : 189.87 Median : 370.90
## Mean : 297.61 Mean : 466.88
## 3rd Qu.: 418.39 3rd Qu.: 733.33
## Max. :1210.17 Max. :1960.87
fstSR %>% summary()
## point_id round SR habitat n_observers
## Length:130 1:65 Min. : 2.000 FARM: 0 Min. :1.000
## Class :character 2:65 1st Qu.: 5.000 FOR :130 1st Qu.:1.000
## Mode :character Median : 7.000 Median :1.000
## Mean : 7.485 Mean :1.177
## 3rd Qu.:10.000 3rd Qu.:1.000
## Max. :13.000 Max. :3.000
## recording start date visibility
## 0: 0 Min. :-1.86057 Min. :-1.784218 Min. :0.000
## 1:130 1st Qu.:-0.75643 1st Qu.:-0.661887 1st Qu.:2.000
## Median :-0.06634 Median : 0.000000 Median :2.000
## Mean : 0.06711 Mean :-0.001476 Mean :1.915
## 3rd Qu.: 0.82387 3rd Qu.: 0.824961 3rd Qu.:2.000
## Max. : 2.55598 Max. : 1.592366 Max. :3.000
## cloud kebele hab_details X
## Min. :0.000 Length:130 Length:130 Min. :36.06
## 1st Qu.:0.000 Class :character Class :character 1st Qu.:36.10
## Median :2.000 Mode :character Mode :character Median :36.20
## Mean :1.469 Mean :36.24
## 3rd Qu.:2.000 3rd Qu.:36.36
## Max. :3.000 Max. :36.48
## Y elevation slope twi forest_type
## Min. :7.570 Min. :1578 Min. : 2.142 Min. : 4.354 0:38
## 1st Qu.:7.623 1st Qu.:1776 1st Qu.: 9.578 1st Qu.: 5.170 1:92
## Median :7.683 Median :1943 Median :12.264 Median : 5.683
## Mean :7.788 Mean :1980 Mean :14.235 Mean : 6.230
## 3rd Qu.:7.990 3rd Qu.:2083 3rd Qu.:19.426 3rd Qu.: 7.023
## Max. :8.060 Max. :2547 Max. :31.282 Max. :11.287
## fr_dis fr_dis85 hli pwv2km
## Min. : 10.00 Min. : 0.0 Min. :1.053 Min. :10.26
## 1st Qu.: 82.46 1st Qu.: 0.0 1st Qu.:1.110 1st Qu.:57.28
## Median :219.54 Median :114.0 Median :1.132 Median :77.89
## Mean :274.87 Mean :211.5 Mean :1.129 Mean :70.01
## 3rd Qu.:435.66 3rd Qu.:325.6 3rd Qu.:1.150 3rd Qu.:86.12
## Max. :910.22 Max. :750.0 Max. :1.192 Max. :93.87
frmSR <- frmSR %>%
mutate(habitat2 = map_chr(hab_details, ~ ifelse(.x %>% str_starts("F"), "CR", .x)))
fstSR <- fstSR %>%
mutate(habitat2 = map_chr(hab_details, ~ ifelse(.x %>% str_starts("C"), "CF", "NCF")))
Bivariate relationships are shown with loess (y~x, red), linear (y~x, blue) and quadratic (y~poly(x,2), yellow) lines, or boxplots where appropriate.
Farmland bivariate relationships for occupancy variables
Discussion of Farmland occupancy variables:
We may want to be careful with pwv1ha (due to skew, although relationship is our expected direction here), and twi (because of skew and strength of the relationship that we don’t really expect).
We may want to consider transforming slope, twi, sidi1ha, pwv500, fl_dis, and fl_dis85.
Farmland bivariate relationships for observation variables
Discussion of Farmland observation variables:
Forest bivariate relationships for occupancy variables
Discussion of forest occupancy variables:
Consider transforming elevation, twi, distance variables, and pwv2km.
Forest bivariate relationships for observation variables
Discussion of farmland observation variables:
Variables we’d like to consider transforming include:
Potential transformations assessed using
bestNormalize::bestNormalize, preferring conventional
transformations where these were close to optimal
Apply transformations:
frmSR <- frmSR %>%
mutate(
slope = sqrt(slope),
twi = log10(twi+1),
sidi1ha = sqrt(sidi1ha),
pwv500 = log10(pwv500+1),
fl_dis = sqrt(fl_dis),
fl_dis85 = sqrt(fl_dis85),
visibility = visibility > 1
)
fstSR <- fstSR %>%
mutate(
elevation = log10(elevation+1),
twi = log10(twi+1),
fr_dis = sqrt(fr_dis),
fr_dis85 = sqrt(fr_dis85),
n_observers = n_observers > 1,
visibility = visibility > 1,
cloud = cloud > 1
)
Transformed farmland vars:
Farmland bivariate relationships for transformed variables
Discussion: all of these suggest a linear trend (not enough difference/data to support curves at extremes). twi remains quite skewed and should be treated with caution.
Transformed forest vars:Forest bivariate relationships for transformed variables
Discussion:
All other continuous variables (excluding binary, X,Y) will need to be scaled and centrered in order to improve comparability in the GLMs. Also need to remove those we identified as being problematic, to prevent their further use.
frmSR <- frmSR %>%
# select(-pwv1ha, -twi) %>%
select(-n_observers, -recording, -cloud) %>%
mutate(across(c(start, date, elevation, slope, twi, sidi1ha, sidi200, pwv1ha, pwv500, fl_dis, fl_dis85),
~scale(.x)[,1]))
fstSR <- fstSR %>%
select(-recording) %>%
mutate(across(c(start, date, elevation, slope, twi, fr_dis, fr_dis85, hli, pwv2km),
~scale(.x)[,1]))
These show Pearson correlation using
corrplot::corrplot
Farmland variables - Pearson correlation
Discussion of farmland correlations:
Suggest removing round, sidi200, pwv500, twi
Forest variables - Pearson correlation
Discussion:
Suggest removing round, fr_dis85, and pwv2km. Possibly also twi because of uncertain trend/spread.
Variables: bold are retained
Variables: bold are retained Suggest removing round, fr_dis85, and pwv2km.
Starting from the full set of variables preselected above, conduct backwards step selection using AIC. Initial trial with poisson with log link (glm(family=poisson)) moving to negative binomial (glmmTMB(family=negbin2)) if over dispersed.
# Full farm formula
fffrm <- formula(SR ~ #habitat2 +
elevation + slope + farm_type + sidi1ha + fl_dis +
poly(start,2) + date + visibility)
# Full forest formula
fffst <- formula(SR ~ #habitat2 +
elevation + slope + forest_type + fr_dis + hli +
poly(start,2) + date + n_observers + visibility + cloud)
m0frm <- glm(formula = fffrm,
family = poisson,
data = frmSR)
AER::dispersiontest(m0frm) # ok
##
## Overdispersion test
##
## data: m0frm
## z = 0.068533, p-value = 0.4727
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 1.00724
summary(m0frm)
##
## Call:
## glm(formula = fffrm, family = poisson, data = frmSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.73926 -0.75673 -0.07539 0.62989 2.52268
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.29952 0.09750 23.585 < 2e-16 ***
## elevation -0.04928 0.03364 -1.465 0.14300
## slope -0.03453 0.02917 -1.183 0.23663
## farm_type1 -0.23225 0.08079 -2.875 0.00404 **
## sidi1ha 0.03321 0.03137 1.059 0.28971
## fl_dis -0.04836 0.03216 -1.504 0.13270
## poly(start, 2)1 0.39163 0.40079 0.977 0.32849
## poly(start, 2)2 -2.16105 0.43050 -5.020 5.17e-07 ***
## date 0.09698 0.03223 3.009 0.00262 **
## visibilityTRUE -0.06135 0.08001 -0.767 0.44320
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 230.75 on 159 degrees of freedom
## Residual deviance: 165.66 on 150 degrees of freedom
## AIC: 799.77
##
## Number of Fisher Scoring iterations: 4
m0frm.s <- MASS::stepAIC(m0frm)
## Start: AIC=799.77
## SR ~ elevation + slope + farm_type + sidi1ha + fl_dis + poly(start,
## 2) + date + visibility
##
## Df Deviance AIC
## - visibility 1 166.24 798.36
## - sidi1ha 1 166.78 798.90
## - slope 1 167.06 799.18
## <none> 165.66 799.77
## - elevation 1 167.81 799.92
## - fl_dis 1 167.93 800.04
## - farm_type 1 173.55 805.67
## - date 1 174.77 806.89
## - poly(start, 2) 2 192.66 822.78
##
## Step: AIC=798.36
## SR ~ elevation + slope + farm_type + sidi1ha + fl_dis + poly(start,
## 2) + date
##
## Df Deviance AIC
## - sidi1ha 1 167.47 797.58
## - slope 1 167.71 797.82
## <none> 166.24 798.36
## - fl_dis 1 168.28 798.39
## - elevation 1 168.47 798.58
## - farm_type 1 174.05 804.16
## - date 1 174.80 804.92
## - poly(start, 2) 2 193.40 821.52
##
## Step: AIC=797.58
## SR ~ elevation + slope + farm_type + fl_dis + poly(start, 2) +
## date
##
## Df Deviance AIC
## - fl_dis 1 169.46 797.58
## <none> 167.47 797.58
## - slope 1 169.78 797.89
## - elevation 1 171.09 799.21
## - date 1 175.80 803.92
## - farm_type 1 176.70 804.81
## - poly(start, 2) 2 196.05 822.17
##
## Step: AIC=797.58
## SR ~ elevation + slope + farm_type + poly(start, 2) + date
##
## Df Deviance AIC
## <none> 169.46 797.58
## - slope 1 171.86 797.98
## - elevation 1 177.22 803.34
## - date 1 178.30 804.42
## - farm_type 1 179.20 805.32
## - poly(start, 2) 2 198.38 822.49
AER::dispersiontest(m0frm.s) # ok
##
## Overdispersion test
##
## data: m0frm.s
## z = 0.217, p-value = 0.4141
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 1.023231
summary(m0frm.s)
##
## Call:
## glm(formula = SR ~ elevation + slope + farm_type + poly(start,
## 2) + date, family = poisson, data = frmSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.55309 -0.73695 -0.02547 0.65748 2.57939
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.27265 0.07283 31.206 < 2e-16 ***
## elevation -0.08094 0.02899 -2.792 0.00524 **
## slope -0.04407 0.02849 -1.547 0.12196
## farm_type1 -0.25424 0.07924 -3.208 0.00133 **
## poly(start, 2)1 0.23528 0.39088 0.602 0.54723
## poly(start, 2)2 -2.14839 0.41249 -5.208 1.9e-07 ***
## date 0.09062 0.03061 2.960 0.00307 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 230.75 on 159 degrees of freedom
## Residual deviance: 169.46 on 153 degrees of freedom
## AIC: 797.58
##
## Number of Fisher Scoring iterations: 4
Selected farm model:
m0fst <- glm(formula = fffst,
family = poisson,
data = fstSR)
AER::dispersiontest(m0fst) # ok
##
## Overdispersion test
##
## data: m0fst
## z = -2.787, p-value = 0.9973
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 0.7353065
summary(m0fst)
##
## Call:
## glm(formula = fffst, family = poisson, data = fstSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.61736 -0.58497 0.09967 0.52791 2.02748
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.888824 0.111317 16.968 < 2e-16 ***
## elevation 0.074737 0.037069 2.016 0.0438 *
## slope -0.045704 0.034687 -1.318 0.1876
## forest_type1 0.166732 0.089881 1.855 0.0636 .
## fr_dis 0.024141 0.039619 0.609 0.5423
## hli -0.023714 0.033323 -0.712 0.4767
## poly(start, 2)1 -0.699798 0.443292 -1.579 0.1144
## poly(start, 2)2 -0.565145 0.433450 -1.304 0.1923
## date 0.183204 0.036278 5.050 4.42e-07 ***
## n_observersTRUE -0.079474 0.096839 -0.821 0.4118
## visibilityTRUE 0.002486 0.095980 0.026 0.9793
## cloudTRUE -0.029178 0.071446 -0.408 0.6830
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 163.63 on 129 degrees of freedom
## Residual deviance: 101.82 on 118 degrees of freedom
## AIC: 617.33
##
## Number of Fisher Scoring iterations: 4
m0fst.s <- MASS::stepAIC(m0fst)
## Start: AIC=617.33
## SR ~ elevation + slope + forest_type + fr_dis + hli + poly(start,
## 2) + date + n_observers + visibility + cloud
##
## Df Deviance AIC
## - visibility 1 101.82 615.33
## - cloud 1 101.98 615.50
## - fr_dis 1 102.19 615.70
## - hli 1 102.32 615.83
## - n_observers 1 102.50 616.01
## - slope 1 103.56 617.08
## <none> 101.82 617.33
## - poly(start, 2) 2 106.58 618.09
## - forest_type 1 105.29 618.80
## - elevation 1 105.86 619.38
## - date 1 127.68 641.19
##
## Step: AIC=615.33
## SR ~ elevation + slope + forest_type + fr_dis + hli + poly(start,
## 2) + date + n_observers + cloud
##
## Df Deviance AIC
## - cloud 1 101.99 613.50
## - fr_dis 1 102.19 613.70
## - hli 1 102.32 613.84
## - n_observers 1 102.51 614.02
## - slope 1 103.57 615.09
## <none> 101.82 615.33
## - poly(start, 2) 2 106.68 616.20
## - forest_type 1 105.29 616.81
## - elevation 1 105.94 617.46
## - date 1 129.67 641.19
##
## Step: AIC=613.5
## SR ~ elevation + slope + forest_type + fr_dis + hli + poly(start,
## 2) + date + n_observers
##
## Df Deviance AIC
## - fr_dis 1 102.30 611.81
## - hli 1 102.58 612.09
## - n_observers 1 102.80 612.32
## - slope 1 103.67 613.18
## <none> 101.99 613.50
## - poly(start, 2) 2 107.15 614.66
## - forest_type 1 105.38 614.89
## - elevation 1 107.01 616.52
## - date 1 129.98 639.50
##
## Step: AIC=611.81
## SR ~ elevation + slope + forest_type + hli + poly(start, 2) +
## date + n_observers
##
## Df Deviance AIC
## - hli 1 102.78 610.30
## - n_observers 1 103.04 610.56
## - slope 1 104.12 611.64
## <none> 102.30 611.81
## - poly(start, 2) 2 107.45 612.97
## - elevation 1 107.79 615.31
## - forest_type 1 107.82 615.34
## - date 1 130.50 638.01
##
## Step: AIC=610.3
## SR ~ elevation + slope + forest_type + poly(start, 2) + date +
## n_observers
##
## Df Deviance AIC
## - n_observers 1 103.62 609.13
## <none> 102.78 610.30
## - slope 1 104.90 610.41
## - poly(start, 2) 2 107.80 611.31
## - forest_type 1 108.00 613.51
## - elevation 1 108.56 614.08
## - date 1 130.77 636.29
##
## Step: AIC=609.13
## SR ~ elevation + slope + forest_type + poly(start, 2) + date
##
## Df Deviance AIC
## - slope 1 105.47 608.99
## <none> 103.62 609.13
## - poly(start, 2) 2 109.30 610.81
## - forest_type 1 108.46 611.97
## - elevation 1 110.09 613.60
## - date 1 132.10 635.61
##
## Step: AIC=608.99
## SR ~ elevation + forest_type + poly(start, 2) + date
##
## Df Deviance AIC
## <none> 105.47 608.99
## - forest_type 1 109.71 611.22
## - elevation 1 110.89 612.41
## - poly(start, 2) 2 112.90 612.41
## - date 1 133.60 635.11
AER::dispersiontest(m0fst.s) # ok
##
## Overdispersion test
##
## data: m0fst.s
## z = -2.4729, p-value = 0.9933
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 0.7631657
summary(m0fst.s)
##
## Call:
## glm(formula = SR ~ elevation + forest_type + poly(start, 2) +
## date, family = poisson, data = fstSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3952 -0.6878 0.1326 0.5454 2.0767
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.86907 0.06657 28.079 < 2e-16 ***
## elevation 0.07926 0.03391 2.338 0.0194 *
## forest_type1 0.16022 0.07863 2.038 0.0416 *
## poly(start, 2)1 -0.85426 0.39031 -2.189 0.0286 *
## poly(start, 2)2 -0.63185 0.41042 -1.540 0.1237
## date 0.18423 0.03506 5.254 1.49e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 163.63 on 129 degrees of freedom
## Residual deviance: 105.47 on 124 degrees of freedom
## AIC: 608.99
##
## Number of Fisher Scoring iterations: 4
Selected forest model:
Random forest models, using the default parameters in the randomForest implementation, selecting variables with %IncMSE of < 5. Because this is a random process, we use 5 set.seeds, and average the results.
For the random forest models these need to remove the poly terms
# Full farm formula
fffrm2 <- formula(SR ~
elevation + slope + farm_type + sidi1ha + fl_dis +
start + date + visibility)
# Full forest formula
fffst2 <- formula(SR ~
elevation + slope + forest_type + fr_dis + hli +
start + date + n_observers + visibility + cloud)
myRF <- function(seed, .formula, .data){
set.seed(seed)
nme <- paste0("%IncMSE", seed)
rf <- randomForest::randomForest(formula = .formula,
data = .data,
importance = TRUE)
imp <- randomForest::importance(rf)
imp %>%
as_tibble %>%
add_column(variable = dimnames(imp)[[1]], .before = 1) %>%
select(-IncNodePurity) %>%
arrange(variable)
}
myRFset <- function(seeds, formula, data){
map_dfc(seeds, ~myRF(.x, formula, data)) %>%
mutate(`mean%IncMSE` = rowMeans(across(where(is.numeric)))) %>%
select(variable = variable...1, `mean%IncMSE`) %>%
arrange(-`mean%IncMSE`)
}
myRFset(1:5, fffrm2, frmSR)
## # A tibble: 8 × 2
## variable `mean%IncMSE`
## <chr> <dbl>
## 1 date 14.6
## 2 start 7.84
## 3 fl_dis 7.66
## 4 elevation 7.03
## 5 farm_type 5.16
## 6 sidi1ha 4.35
## 7 slope 0.467
## 8 visibility -0.819
RF selected farm variables: * [>5%IncMSE]: date, start, fl_dis, elevation, farm_type * [<5%IncMSE]: sidi1ha * [<1%IncMSE]: slope, visibility
myRFset(1:5, fffst2, fstSR)
## # A tibble: 10 × 2
## variable `mean%IncMSE`
## <chr> <dbl>
## 1 date 31.7
## 2 elevation 9.24
## 3 forest_type 5.45
## 4 fr_dis 5.43
## 5 hli 4.29
## 6 start 2.47
## 7 n_observers 1.93
## 8 cloud 0.865
## 9 visibility 0.566
## 10 slope -1.08
RF selected forest variables: * [>5%IncMSE]: date, elevation, fr_dis, forest_type * [<5%IncMSE]: hli, start, twi, cloud, * [<1%IncMSE]: n_observers, visibility, slope
Starting from the set of variables selected in the RF, conduct backwards step selection using AIC. Initial trial with poisson with log link (glm(family=poisson)) moving to negative binomial (glmmTMB(family=negbin2)) if overdispersed.
# Full farm formula
fffrm3 <- formula(SR ~
elevation + fl_dis + farm_type +
date + poly(start,2))
# Full forest formula
fffst3 <- formula(SR ~
elevation + fr_dis + forest_type +
date)
m2frm <- glm(formula = fffrm3,
family = poisson,
data = frmSR)
AER::dispersiontest(m2frm) # ok
##
## Overdispersion test
##
## data: m2frm
## z = 0.25885, p-value = 0.3979
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 1.028516
summary(m2frm)
##
## Call:
## glm(formula = fffrm3, family = poisson, data = frmSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.85952 -0.73446 -0.05434 0.63243 2.55008
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.27035 0.07288 31.151 < 2e-16 ***
## elevation -0.05683 0.03208 -1.772 0.07645 .
## fl_dis -0.04619 0.03204 -1.442 0.14942
## farm_type1 -0.25130 0.07928 -3.170 0.00153 **
## date 0.09112 0.03065 2.973 0.00295 **
## poly(start, 2)1 0.35208 0.38989 0.903 0.36651
## poly(start, 2)2 -2.08370 0.41036 -5.078 3.82e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 230.75 on 159 degrees of freedom
## Residual deviance: 169.78 on 153 degrees of freedom
## AIC: 797.89
##
## Number of Fisher Scoring iterations: 4
m2frm.s <- MASS::stepAIC(m2frm)
## Start: AIC=797.89
## SR ~ elevation + fl_dis + farm_type + date + poly(start, 2)
##
## Df Deviance AIC
## <none> 169.78 797.89
## - fl_dis 1 171.86 797.98
## - elevation 1 172.93 799.04
## - date 1 178.69 804.81
## - farm_type 1 179.29 805.41
## - poly(start, 2) 2 197.66 821.77
AER::dispersiontest(m2frm.s) # ok
##
## Overdispersion test
##
## data: m2frm.s
## z = 0.25885, p-value = 0.3979
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 1.028516
summary(m2frm.s)
##
## Call:
## glm(formula = SR ~ elevation + fl_dis + farm_type + date + poly(start,
## 2), family = poisson, data = frmSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.85952 -0.73446 -0.05434 0.63243 2.55008
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.27035 0.07288 31.151 < 2e-16 ***
## elevation -0.05683 0.03208 -1.772 0.07645 .
## fl_dis -0.04619 0.03204 -1.442 0.14942
## farm_type1 -0.25130 0.07928 -3.170 0.00153 **
## date 0.09112 0.03065 2.973 0.00295 **
## poly(start, 2)1 0.35208 0.38989 0.903 0.36651
## poly(start, 2)2 -2.08370 0.41036 -5.078 3.82e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 230.75 on 159 degrees of freedom
## Residual deviance: 169.78 on 153 degrees of freedom
## AIC: 797.89
##
## Number of Fisher Scoring iterations: 4
Selected farm model:
m2fst <- glm(formula = fffst3,
family = poisson,
data = fstSR)
AER::dispersiontest(m2fst) # ok
##
## Overdispersion test
##
## data: m2fst
## z = -1.7812, p-value = 0.9626
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 0.8198381
summary(m2fst)
##
## Call:
## glm(formula = fffst3, family = poisson, data = fstSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3850 -0.7216 0.1520 0.5802 2.2257
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.88370 0.07261 25.942 < 2e-16 ***
## elevation 0.06747 0.03380 1.996 0.0459 *
## fr_dis 0.01849 0.03809 0.485 0.6274
## forest_type1 0.14425 0.08892 1.622 0.1048
## date 0.18867 0.03440 5.484 4.15e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 163.63 on 129 degrees of freedom
## Residual deviance: 112.66 on 125 degrees of freedom
## AIC: 614.18
##
## Number of Fisher Scoring iterations: 4
m2fst.s <- MASS::stepAIC(m2fst)
## Start: AIC=614.18
## SR ~ elevation + fr_dis + forest_type + date
##
## Df Deviance AIC
## - fr_dis 1 112.90 612.41
## <none> 112.66 614.18
## - forest_type 1 115.32 614.83
## - elevation 1 116.61 616.12
## - date 1 143.16 642.68
##
## Step: AIC=612.41
## SR ~ elevation + forest_type + date
##
## Df Deviance AIC
## <none> 112.90 612.41
## - elevation 1 117.10 614.62
## - forest_type 1 117.44 614.96
## - date 1 143.89 641.40
AER::dispersiontest(m2fst.s) # ok
##
## Overdispersion test
##
## data: m2fst.s
## z = -1.783, p-value = 0.9627
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 0.8208674
summary(m2fst.s)
##
## Call:
## glm(formula = SR ~ elevation + forest_type + date, family = poisson,
## data = fstSR)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3554 -0.7543 0.1550 0.6110 2.1909
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.86928 0.06635 28.172 < 2e-16 ***
## elevation 0.06910 0.03356 2.059 0.0395 *
## forest_type1 0.16484 0.07810 2.111 0.0348 *
## date 0.18958 0.03432 5.524 3.32e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 163.63 on 129 degrees of freedom
## Residual deviance: 112.90 on 126 degrees of freedom
## AIC: 612.41
##
## Number of Fisher Scoring iterations: 4
Selected forest model:
Starting from the full model, use dredge to identify the top 10% of models. Conduct importance on these, selecting variables in at least 1/3 of the top models.
d0frm <- MuMIn::dredge(m0frm)
## Fixed term is "(Intercept)"
d2frm <- d0frm[i=1:(0.1*nrow(d0frm)), recalc.weights = TRUE, recalc.delta = FALSE]
MuMIn::sw(d2frm)
## date farm_type poly(start, 2) elevation fl_dis sidi1ha
## Sum of weights: 1.00 1.00 1.00 0.70 0.62 0.49
## N containing models: 25 25 25 16 16 13
## slope visibility
## Sum of weights: 0.44 0.30
## N containing models: 12 12
nrow(d2frm)/3
## [1] 8.333333
Farm variables retained: date farm_type poly(start, 2) elevation fl_dis sidi1ha slope visibility
d0fst <- MuMIn::dredge(m0fst)
## Fixed term is "(Intercept)"
d2fst <- d0fst[i=1:(0.1*nrow(d0fst)), recalc.weights = TRUE, recalc.delta = FALSE]
MuMIn::sw(d2fst)
## date elevation forest_type poly(start, 2) slope
## Sum of weights: 1.00 0.90 0.81 0.76 0.47
## N containing models: 102 83 77 70 49
## n_observers fr_dis hli cloud visibility
## Sum of weights: 0.26 0.25 0.23 0.21 0.16
## N containing models: 31 32 29 29 21
nrow(d2fst)/3
## [1] 34
Forest variables retained: date elevation forest_type poly(start, 2) slope
Selecting variables for the multi-species occupancy model based on their relationships with species richness is a pragmatic approach. Some of the relationships seen in the species richness may not translate to individual species, but due to the low number of observations, we do need to reduce the number of variables we are estimating.
Farm variables selected by the different approaches:
suggest using:
Forest variables selected by the different approaches:
In the latter, n-observers was also one of the last de-selected. One to consider.
suggest using: